TIR 2015 Workshop Preface
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and the publication of the proceedings record.
TU Graz: Course: 707.000 Web Science and Web Technology: Lecture 10: Text Mining
This class introduces basics of web mining and information retrieval including, for example, an introduction to the Vector Space Model and Text Mining.
Guest Lecturer: Dr. Michael Granitzer
Optional: Modeling the Internet and the Web: Probabilistic Methods and Algorithms, Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003 (Chapter 4, Text Analysis)
Towards a Feature-Rich Data Set for Personalized Access to Long-Tail Content
Personalized data access has become one of the core challenges for intelligent information access, especially for non-mainstream long-tail content, as can be found in digital libraries. One of the main reasons that personalization remains a difficult task is the lack of standardized test corpora. In this paper we provide a comprehensive analysis of feature requirements for personalization together with a data collection tool for generating user models and collecting data for personalization of search and recommender system optimization in the long-tail. Based on the feature analysis, we provide a feature-rich publicly available data set, covering web content consumption and creation tasks. Our data set contains user models for eight users, including performed tasks, relevant topics for each task, relevance ratings, and relations between focus text and search queries. Altogether, the data set consists of 217 tasks, 4562 queries and over 15,000 ratings. On this data we perform automatic query prediction from web page content, achieving an accuracy of 89% using term identity, capitalization and part-of-speech tags as features. The results of the feature analysis can serve as a guideline for feature collection for long-tail content personalization, and the provided data set as a gold standard for learning and evaluation of user models as well as for optimizing recommender or search engines for long-tail domains.
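The abstract above names three feature types for query prediction: term identity, capitalization and part-of-speech tags. As a rough illustration only, the sketch below shows one way such per-token features might be assembled. The function names, the dictionary layout and the hand-tagged sample input are all assumptions made for this example and are not taken from the paper, which does not describe its implementation here.

```python
from typing import Dict, List, Tuple


def token_features(token: str, pos_tag: str) -> Dict[str, object]:
    """Build the three feature types named in the abstract for one token.

    Hypothetical helper: the paper's actual feature encoding is not given.
    """
    return {
        "term": token.lower(),                  # term identity (case-folded)
        "is_capitalized": token[:1].isupper(),  # capitalization of first letter
        "pos": pos_tag,                         # part-of-speech tag, supplied by the caller
    }


def page_features(tagged_tokens: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Map a sequence of (token, POS tag) pairs to one feature dict per token."""
    return [token_features(tok, pos) for tok, pos in tagged_tokens]


# Hand-tagged sample input (an assumption; a real pipeline would use a POS tagger).
example = [("Graz", "NNP"), ("offers", "VBZ"), ("lectures", "NNS")]
features = page_features(example)
```

Feature dictionaries of this shape could then be passed to any standard classifier that decides, per token, whether the term is likely to appear in a query. Which classifier the authors used is not stated in the abstract.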
- …